Tidy Tuesday

(or all night Thursday and all day Friday)

For this Tidy Tuesday, I will be using the most recent dataset. I was planning to start with older ones, but being a single parent, I know all too well what childcare costs.

Let’s Go

Load Libraries

library(here)
library(ggplot2) # for graphs
library(gganimate) # to animate
library(readr) # to read the data file
library(dplyr) # to wrangle data (also provides glimpse())

Next, we have to load our dataset.

Load Dataset

Good Lord, this dataset is huge!!

childcare_costs <- read.csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-05-09/childcare_costs.csv')

glimpse(childcare_costs) # VERY large data set
## Rows: 34,567
## Columns: 61
## $ county_fips_code          <int> 1001, 1001, 1001, 1001, 1001, 1001, 1001, 10…
## $ study_year                <int> 2008, 2009, 2010, 2011, 2012, 2013, 2014, 20…
## $ unr_16                    <dbl> 5.42, 5.93, 6.21, 7.55, 8.60, 9.39, 8.50, 7.…
## $ funr_16                   <dbl> 4.41, 5.72, 5.57, 8.13, 8.88, 10.31, 9.18, 8…
## $ munr_16                   <dbl> 6.32, 6.11, 6.78, 7.03, 8.29, 8.56, 7.95, 6.…
## $ unr_20to64                <dbl> 4.6, 4.8, 5.1, 6.2, 6.7, 7.3, 6.8, 5.9, 4.4,…
## $ funr_20to64               <dbl> 3.5, 4.6, 4.6, 6.3, 6.4, 7.6, 6.8, 6.1, 4.6,…
## $ munr_20to64               <dbl> 5.6, 5.0, 5.6, 6.1, 7.0, 7.0, 6.8, 5.9, 4.3,…
## $ flfpr_20to64              <dbl> 68.9, 70.8, 71.3, 70.2, 70.6, 70.7, 69.9, 68…
## $ flfpr_20to64_under6       <dbl> 66.9, 63.7, 67.0, 66.5, 67.1, 67.5, 65.2, 66…
## $ flfpr_20to64_6to17        <dbl> 79.59, 78.41, 78.15, 77.62, 76.31, 75.91, 75…
## $ flfpr_20to64_under6_6to17 <dbl> 60.81, 59.91, 59.71, 59.31, 58.30, 58.00, 57…
## $ mlfpr_20to64              <dbl> 84.0, 86.2, 85.8, 85.7, 85.7, 85.0, 84.2, 82…
## $ pr_f                      <dbl> 8.5, 7.5, 7.5, 7.4, 7.4, 8.3, 9.1, 9.3, 9.4,…
## $ pr_p                      <dbl> 11.5, 10.3, 10.6, 10.9, 11.6, 12.1, 12.8, 12…
## $ mhi_2018                  <dbl> 58462.55, 60211.71, 61775.80, 60366.88, 5915…
## $ me_2018                   <dbl> 32710.60, 34688.16, 34740.84, 34564.32, 3432…
## $ fme_2018                  <dbl> 25156.25, 26852.67, 27391.08, 26727.68, 2796…
## $ mme_2018                  <dbl> 41436.80, 43865.64, 46155.24, 45333.12, 4427…
## $ total_pop                 <int> 49744, 49584, 53155, 53944, 54590, 54907, 55…
## $ one_race                  <dbl> 98.1, 98.6, 98.5, 98.5, 98.5, 98.6, 98.7, 98…
## $ one_race_w                <dbl> 78.9, 79.1, 79.1, 78.9, 78.9, 78.3, 78.0, 77…
## $ one_race_b                <dbl> 17.7, 17.9, 17.9, 18.1, 18.1, 18.4, 18.6, 18…
## $ one_race_i                <dbl> 0.4, 0.4, 0.3, 0.2, 0.3, 0.3, 0.4, 0.4, 0.4,…
## $ one_race_a                <dbl> 0.4, 0.6, 0.7, 0.7, 0.8, 1.0, 0.9, 1.0, 0.8,…
## $ one_race_h                <dbl> 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1,…
## $ one_race_other            <dbl> 0.7, 0.7, 0.6, 0.5, 0.4, 0.7, 0.7, 0.9, 1.4,…
## $ two_races                 <dbl> 1.9, 1.4, 1.5, 1.5, 1.5, 1.4, 1.3, 1.6, 2.0,…
## $ hispanic                  <dbl> 1.8, 2.0, 2.3, 2.4, 2.4, 2.5, 2.5, 2.6, 2.6,…
## $ households                <int> 18373, 18288, 19718, 19998, 19934, 20071, 20…
## $ h_under6_both_work        <int> 1543, 1475, 1569, 1695, 1714, 1532, 1557, 13…
## $ h_under6_f_work           <int> 970, 964, 1009, 1060, 938, 880, 1191, 1258, …
## $ h_under6_m_work           <int> 22, 16, 16, 106, 120, 161, 159, 211, 109, 10…
## $ h_under6_single_m         <int> 995, 1099, 1110, 1030, 1095, 1160, 954, 883,…
## $ h_6to17_both_work         <int> 4900, 5028, 5472, 5065, 4608, 4238, 4056, 40…
## $ h_6to17_fwork             <int> 1308, 1519, 1541, 1965, 1963, 1978, 2073, 20…
## $ h_6to17_mwork             <int> 114, 92, 113, 246, 284, 354, 373, 551, 322, …
## $ h_6to17_single_m          <int> 1966, 2305, 2377, 2299, 2644, 2522, 2269, 21…
## $ emp_m                     <dbl> 27.40, 29.54, 29.33, 31.17, 32.13, 31.74, 32…
## $ memp_m                    <dbl> 24.41, 26.07, 25.94, 26.97, 28.59, 27.44, 28…
## $ femp_m                    <dbl> 30.68, 33.40, 33.06, 35.96, 36.09, 36.61, 37…
## $ emp_service               <dbl> 17.06, 15.81, 16.92, 16.18, 16.09, 16.72, 16…
## $ memp_service              <dbl> 15.53, 14.16, 15.09, 14.21, 14.71, 13.92, 13…
## $ femp_service              <dbl> 18.75, 17.64, 18.93, 18.42, 17.63, 19.89, 20…
## $ emp_sales                 <dbl> 29.11, 28.75, 29.07, 27.56, 28.39, 27.22, 25…
## $ memp_sales                <dbl> 15.97, 17.51, 17.82, 17.74, 17.79, 17.38, 15…
## $ femp_sales                <dbl> 43.52, 41.25, 41.43, 38.76, 40.26, 38.36, 36…
## $ emp_n                     <dbl> 13.21, 11.89, 11.57, 10.72, 9.02, 9.27, 9.38…
## $ memp_n                    <dbl> 22.54, 20.30, 19.86, 18.28, 16.03, 16.79, 17…
## $ femp_n                    <dbl> 2.99, 2.52, 2.45, 2.09, 1.19, 0.77, 0.58, 0.…
## $ emp_p                     <dbl> 13.22, 14.02, 13.11, 14.38, 14.37, 15.04, 16…
## $ memp_p                    <dbl> 21.55, 21.96, 21.28, 22.80, 22.88, 24.48, 24…
## $ femp_p                    <dbl> 4.07, 5.19, 4.13, 4.77, 4.84, 4.36, 6.07, 7.…
## $ mcsa                      <dbl> 80.92, 83.42, 85.92, 88.43, 90.93, 93.43, 95…
## $ mfccsa                    <dbl> 81.40, 85.68, 89.96, 94.25, 98.53, 102.82, 1…
## $ mc_infant                 <dbl> 104.95, 105.11, 105.28, 105.45, 105.61, 105.…
## $ mc_toddler                <dbl> 104.95, 105.11, 105.28, 105.45, 105.61, 105.…
## $ mc_preschool              <dbl> 85.92, 87.59, 89.26, 90.93, 92.60, 94.27, 95…
## $ mfcc_infant               <dbl> 83.45, 87.39, 91.33, 95.28, 99.22, 103.16, 1…
## $ mfcc_toddler              <dbl> 83.45, 87.39, 91.33, 95.28, 99.22, 103.16, 1…
## $ mfcc_preschool            <dbl> 81.40, 85.68, 89.96, 94.25, 98.53, 102.82, 1…

Now that the data is loaded, I want to check whether there are any constant columns that need to be removed, so they do not become a problem when we try to graph something.

JUST KIDDING…

Apparently I copied the “raw” dataset, which still needs to be cleaned, and I couldn’t figure out why it wasn’t working. I tried to figure it out, but none of the suggestions I found worked. Here is a screenshot of the message.

I am fairly certain that I wasn’t mapping it correctly.

[Screenshot of the error message]

So I loaded the data the sure way to save time.

I just realized I can add a picture from anywhere on my computer by coding the full file path. Sweet.
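In an R Markdown document, one way to do this is `knitr::include_graphics()` with an absolute path. The sketch below is an assumption about the setup, not the exact code used for the screenshot above. It writes a throwaway PNG (the hypothetical `example_plot.png`) to a temporary folder so the path is guaranteed to exist. In practice the path would point at an existing file, such as a saved screenshot.

```r
library(knitr)

# Create a throwaway image so the example has a real file to point at.
# In practice this would be an existing file, e.g. a saved screenshot.
img_path <- file.path(tempdir(), "example_plot.png")
png(img_path, width = 400, height = 300)
plot(1:10, main = "Placeholder image")
dev.off()

# Convert to a full (absolute) path and hand it to knitr.
full_path <- normalizePath(img_path)
include_graphics(full_path)
```

Inside a code chunk, the last line is what places the image in the knitted document.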

Let’s continue with our goal

I am going to graph and animate the price range of family-run daycare over time, using ggplot2 and gganimate.

# Create a scatter plot of weekly family daycare prices by year
# (this only counts family-run daycare, but I thought that would make a long title)
p <- ggplot(childcare_costs, aes(x = study_year, y = mfcc_preschool, color = study_year)) +
  geom_point(size = 3) + # size sets how large each point is drawn
  scale_x_continuous(breaks = 2006:2020, labels = 2006:2020) + # specify the tick values on the x-axis
  scale_y_continuous(labels = scales::dollar_format()) + # show the y-axis values with a dollar sign
  labs(title = 'Childcare Price Over Time', x = 'Year', y = 'Price of Family Daycare (per week)') + # label the axes and title
  theme_minimal() # set the theme

p

p_anim <- p +
  transition_states(study_year, transition_length = 2, state_length = 1) +
  labs(title = 'Year: {closest_state}') # transition_states steps through the plot one study_year at a time; {closest_state} puts the current year in the title, replacing the static title set above

animate(p_anim) # display the animation
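Because the scatter plot draws every county for every year, the spread within a year can be hard to read off the animation. As a rough sketch, the yearly range could also be summarized with dplyr. The small `toy_costs` data frame below is made up for illustration and is not the real data; with the real data, `childcare_costs` would take its place.

```r
library(dplyr)

# Made-up stand-in for childcare_costs: three counties, two years.
# These values are illustrative only, not taken from the real dataset.
toy_costs <- data.frame(
  study_year     = c(2008, 2008, 2008, 2009, 2009, 2009),
  mfcc_preschool = c(80, 100, NA, 90, 110, 130)
)

# One row per year with the lowest, median, and highest weekly price.
# na.rm = TRUE skips any missing prices.
price_range <- toy_costs %>%
  group_by(study_year) %>%
  summarise(
    min_price    = min(mfcc_preschool, na.rm = TRUE),
    median_price = median(mfcc_preschool, na.rm = TRUE),
    max_price    = max(mfcc_preschool, na.rm = TRUE),
    .groups = "drop"
  )

print(price_range)
```

A table like this gives one row per year, which is easier to compare across years than thousands of overlapping points.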